LS4003 R worksheet 2

Sleep quality

Introduction to the data

For this worksheet, you will need the Sleep_health_and_lifestyle_dataset.csv file from the Canvas page.

This dataset contains values for sleep quality and lifestyle factors. The data are artificial and were generated for illustrative purposes.

This dataset contains the following information:

Sleep data columns and descriptions
Column          Description
Age             Age of the individual
Sex             Sex of the individual (Male/Female)
Occupation      Occupation of the individual
sleepduration   Average length of sleep (hours)
sleepquality    Average sleep quality score
activitylevel   Value based on the average level of physical activity (minutes/day)
stress          Self-rated score of how stressed the individual feels
heartrate       Resting heart rate of the individual (beats per minute)
steps           Average number of steps taken per day
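A minimal sketch of loading the file into R. The helper name load_sleep_data is made up for this example, and the code assumes the column names in the CSV match the table above exactly; if they differ, check names(sleep) and adjust.

```r
# Read the worksheet CSV into a data frame.
# `path` is wherever you saved the file downloaded from Canvas.
# Assumption: the CSV has columns named Sex and Occupation, as in the table.
load_sleep_data <- function(path) {
  sleep <- read.csv(path, stringsAsFactors = FALSE)
  # Store the categorical columns as factors so they can be used for grouping
  sleep$Sex <- factor(sleep$Sex)
  sleep$Occupation <- factor(sleep$Occupation)
  sleep
}

# Example use (assumes the file is in your working directory):
# sleep <- load_sleep_data("Sleep_health_and_lifestyle_dataset.csv")
# str(sleep)      # column names and types
# summary(sleep)  # ranges for numeric columns, counts for factors
```

Running str() and summary() first is a quick way to confirm that the numeric columns really were read as numbers before looking for correlations.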

© [Rudzhan] / Adobe Stock


The task

The task for this worksheet is to determine if there are any correlations in the dataset and what they are.
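One possible way to start is sketched below, using the base R functions cor() and cor.test(). The helper names correlation_matrix and test_pair are invented for this example, and `sleep` in the usage comments is assumed to be the data frame you read from the CSV.

```r
# Correlation coefficients for every pair of numeric columns in a data frame.
# Non-numeric columns (such as Sex and Occupation) are dropped first.
correlation_matrix <- function(df) {
  numeric_cols <- df[sapply(df, is.numeric)]
  cor(numeric_cols, use = "pairwise.complete.obs")
}

# Test one pair of columns, given their names as strings.
# method can be "pearson", "spearman" or "kendall".
test_pair <- function(df, x, y, method = "pearson") {
  cor.test(df[[x]], df[[y]], method = method)
}

# Example use (assumes `sleep` holds the worksheet data):
# round(correlation_matrix(sleep), 2)
# test_pair(sleep, "sleepduration", "heartrate")
# test_pair(sleep, "sleepquality", "stress", method = "spearman")
```

The matrix gives an overview of which pairs look promising; cor.test() then reports the estimate together with a p-value for a chosen pair. A rank-based method such as Spearman may be a better fit for score columns like sleepquality and stress, but that is a judgment call for you to make.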

You should plot any correlations you find on a graph, such as the one below:

Example of generated graph

A graph showing the correlation between sleep duration and resting heart rate, separated by sex.
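A graph along these lines can be sketched with base R graphics as follows. This is not necessarily how the example graph was made; the function name plot_sleep_vs_heartrate is invented here, and the column names Sex, sleepduration and heartrate are assumed to match the table in the introduction.

```r
# Scatter plot of resting heart rate against sleep duration,
# with one colour and one fitted straight line per sex.
plot_sleep_vs_heartrate <- function(df) {
  df$Sex <- factor(df$Sex)
  groups <- levels(df$Sex)
  colours <- seq_along(groups) + 1  # palette indices 2, 3, ...

  plot(df$sleepduration, df$heartrate,
       col = colours[as.integer(df$Sex)], pch = 19,
       xlab = "Sleep duration (hours)",
       ylab = "Resting heart rate (beats per minute)")

  # Add a least-squares line for each group
  for (i in seq_along(groups)) {
    fit <- lm(heartrate ~ sleepduration, data = df[df$Sex == groups[i], ])
    abline(fit, col = colours[i])
  }

  legend("topright", legend = groups, col = colours, pch = 19, title = "Sex")
  invisible(groups)
}

# Example use (assumes `sleep` holds the worksheet data):
# plot_sleep_vs_heartrate(sleep)
```

The same pattern works for any other pair of numeric columns: swap the two column names and the axis labels.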

There are lots of different variables in this dataset, so do explore. What can you find out?

Extension task

Correlation doesn’t imply causation, and the Spurious Correlations website has many examples that illustrate this.

If you go to the website and click on a correlation, you should get a table with the raw data.

You can then:

1. Copy and paste this table into Excel
2. Save your Excel spreadsheet as a CSV file
3. Read the CSV in R
4. Plot and test the correlations

Note

It may not be as straightforward as you’d think to copy the data from the table and get it into R.

You might find that R doesn’t like spaces in column or row titles, or that your columns and rows are the wrong way around.
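Both problems can be handled in R after reading the file. The sketch below assumes the saved CSV has one variable per row, with the row title in the first column and one value per year in the remaining columns; that layout is an assumption, so check it against your own file. The function name tidy_wide_table is made up for this example.

```r
# Turn a "wide" table (one variable per row) into a data frame
# with one variable per column and R-friendly column names.
tidy_wide_table <- function(path) {
  # header = FALSE keeps the first row as data rather than column names
  raw <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)

  # Drop the title column, then swap rows and columns with t()
  values <- t(raw[, -1, drop = FALSE])
  tidy <- as.data.frame(values, stringsAsFactors = FALSE)

  # make.names() replaces spaces and other awkward characters with dots
  names(tidy) <- make.names(raw[[1]], unique = TRUE)

  # Make sure every column is numeric (non-numbers become NA with a warning)
  tidy[] <- lapply(tidy, as.numeric)
  rownames(tidy) <- NULL
  tidy
}

# Example use ("spurious.csv" is a placeholder file name):
# spurious <- tidy_wide_table("spurious.csv")
# names(spurious)
```

If your file already has one variable per column, plain read.csv() is enough: by default it converts column names in the same way, so a title such as "My column" becomes My.column.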